2. Random variable, stochastic process and simulation.¶
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import scipy as sp
# We will be using random numbers from numpy random library
from numpy.random import normal, choice, uniform
import holoviews as hv
from holoviews import opts
hv.notebook_extension("plotly")
2.1. Random numbers, probability distribution and infernece libraries.¶
The numpy.random has the fastest random number generators that are based on low level code written in C.
The Scipy.stats has an extensive library of statistical distributions and tools for statistical analysis.
The Statsmodels Enhancing Scipy functionality with more tools for stat snslysis
The Seaborn library that enhanced matplotlib functionality for easy stat visualization.
2.2. General overview of random numbers in python¶
First we take a look at most widely used random numbers of numpy also called standard random numbers. These are rand (uniform random number on interval 0,1) and randn (stnadard normal random number with 0 mean and 1 variance). Useful Tip: when writing any code withr andom numbers is pay attention to the seed. If you want to test your code use the same seed to check for reproducability.
#np.random.seed(8376743)
np.random.rand(10) # Generates standard uniform random numbers U(0,1)
array([0.00383114, 0.68098948, 0.72180594, 0.96673167, 0.42840882,
0.06162404, 0.32489371, 0.86804435, 0.34075522, 0.21707998])
#plt.plot(np.random.rand(1000), 'o', color='green', alpha=0.5)
plt.hist(np.random.rand(10000),20)
plt.ylabel('Counts')
plt.xlabel('Bins')
Text(0.5, 0, 'Bins')
#np.random.seed(18493)
np.random.randn(10) # Generates standard random numbers N(0,1)
array([ 1.07857319, 2.09977617, 0.4268357 , -0.75823338, -0.07690082,
-1.10798635, -0.35363468, 1.11080042, -0.2035376 , 1.57529139])
#plt.plot(np.random.randn(1000), 'o', alpha=0.5)
plt.hist(np.random.randn(1000))
(array([ 14., 30., 71., 158., 233., 238., 141., 74., 32., 9.]),
array([-3.08143831, -2.46019683, -1.83895536, -1.21771389, -0.59647242,
0.02476906, 0.64601053, 1.267252 , 1.88849348, 2.50973495,
3.13097642]),
<BarContainer object of 10 artists>)
Parametraized random number allow you to set parameters like value of mean and interval length and therefore can be viewed as generalized versions of standard random numbers. Below we take a look at examples of continuous (random and unofrm RVs) and discrete random numbers generated by paratmerized distributions.
np.random.uniform(low=-1, high=1, size=(3, 4))
array([[ 0.74798007, -0.30135745, -0.84996944, -0.61404309],
[-0.78237279, -0.55931202, 0.43973905, 0.71333747],
[-0.37632603, 0.65483862, -0.02960389, -0.40676189]])
np.random.normal(loc=8, scale=10, size=(4, 4))
array([[11.72055658, 6.05639092, -6.79557925, 23.89280217],
[11.01446751, 16.40218783, 11.26136175, 2.43549872],
[ 0.657569 , 17.73375805, 7.002535 , 4.68432098],
[11.6757606 , 24.70834915, -3.2582135 , -1.70771326]])
np.random.binomial(n=10, p=0.6, size=20) # This one is Binomial distributions. You can see it is discrete as exepcted.
array([6, 5, 6, 5, 4, 7, 8, 3, 5, 4, 6, 4, 7, 2, 6, 4, 5, 4, 7, 5])
2.2.1. Using random numbers to get answers via simulations¶
One of the major uses of random numbers is for conducting numerical simulations. What is a simulation? It is a recreation of a process on a computer. And this recreation is done by random numbers. E.g to simulate coint tosses, die throws, diffusion of molecules, conformational change of polymers we use random number to recreate the process on a computer. Let’s start off by asking some simple questions
How often do we get a run of 5 or more consecutive heads in 100 coin tosses if we repeat the experiment 1000 times?
What if the coin is biased to generate heads only 40% of the time?
L = 100 # length of each trajectory
N = 1 # number of experiments: stochastic trajecotries generated
xs = np.random.choice([0,1], (L, N)) # (i) Unbiased coin p=[0.5,0.5] by default
ys = np.random.choice([0,1], (L, N), p=[0.9, 0.1]) # (ii) biased coin
fig, ax = plt.subplots(ncols=2)
ax[0].plot(xs,'o',alpha=0.5,label = 'data1')
ax[0].plot(ys,'o',alpha=0.5, label = 'data2')
ax[1].hist((xs[:,0], ys[:,0]), 2, label = ("data1", "data2"))
ax[0].legend()
ax[1].legend()
<matplotlib.legend.Legend at 0x7f2424c62250>
np.random.seed(12738)
# Non-fancy way of simulating the answer
runs = 0
for x in xs: # plug either xs or ys
m = 0
for i in x:
if i == 1:
m += 1
if m >=5:
runs += 1
break
else:
m = 0
runs
0
2.2.2. Simple, discrete, unbiased random walk in 1D.¶
Let’s start off by simulating a random walk outlined in class by using a random number generators from numpy called choice and normal. Type help(np.random.choice) and help(np.random.normal) to learn more.
numpy.random.choice(A,size=(M,N,K))
returns element from a list of objects, A with uniform and equal probabilities (default) or with unqeual probabilities once supplied a list. Returend array can be a 1D, 2D or 3D aray of shape M,N,K.
choice(['bagel','muffin','croissant'], size=(3,5))
array([['bagel', 'croissant', 'muffin', 'muffin', 'croissant'],
['muffin', 'croissant', 'bagel', 'croissant', 'bagel'],
['croissant', 'croissant', 'muffin', 'croissant', 'bagel']],
dtype='<U9')
The numpy.random.normal(mu,sigma,size=(M,N,K)) retruns random number distributed according to gausian probability function in the form of 1D, 2D or 3D arays of length N,M,K
# Here we generate 3 sequences of normally distributed random variables of length 5
normal(0,2,(3,5))
array([[-2.69161384, -1.76025845, 2.6359338 , -1.27629114, 0.28905087],
[ 1.10431222, -2.35128534, 1.62603244, -0.08737834, 0.95731216],
[-0.47182587, -0.10790324, 0.85141111, 0.9185157 , -0.58355616]])
2.2.3. Simulating a 1D unbiased random walk¶
In the course of simulating random walks we will be genreating multidimensional numpy arrays. We will adhere to a convnetion that:
Rows are regarded as number of measurments, or samples
Columns are regarded as number of observables distinct measruments/trajectories
def rw_1d(L, N=1):
'''1d random walk function:
L: trajectory length
N: Number of trajecotry
'''
rw = choice([-1,1], size=(L, N))
rw_sum=rw.cumsum(axis=0)
rw_sum[0,:]=0
return rw_sum
rw_sum = rw_1d(1000, 100)
plt.plot(rw_sum, alpha=0.7)
plt.title(f'{N} random walks', fontsize=20)
plt.xlabel(r'$Steps$',fontsize=20)
plt.ylabel(r'$Position$',fontsize=20)
plt.grid('--')
import pandas as pd
rw_data = pd.DataFrame(rw_sum)
rw_data
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | -2 | -2 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | -2 |
| 2 | -1 | 3 | 3 | 1 | 1 | -1 | 1 | -1 | 1 | 1 | ... | -1 | -3 | 1 | 3 | -1 | -1 | 1 | 1 | -1 | -3 |
| 3 | 0 | 2 | 4 | 0 | 2 | -2 | 0 | 0 | 2 | 2 | ... | 0 | -4 | 2 | 4 | -2 | 0 | 0 | 2 | -2 | -4 |
| 4 | 1 | 1 | 5 | -1 | 3 | -3 | 1 | -1 | 1 | 1 | ... | -1 | -5 | 1 | 5 | -1 | -1 | 1 | 3 | -3 | -3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 20 | 34 | 44 | -26 | 0 | -52 | 8 | 40 | 30 | -24 | ... | 50 | 2 | 20 | -36 | 26 | 56 | 30 | -16 | -6 | 12 |
| 996 | 21 | 33 | 43 | -27 | -1 | -53 | 9 | 41 | 31 | -25 | ... | 51 | 1 | 19 | -37 | 25 | 55 | 31 | -17 | -7 | 13 |
| 997 | 22 | 34 | 42 | -26 | 0 | -52 | 10 | 42 | 30 | -26 | ... | 52 | 0 | 18 | -38 | 24 | 56 | 30 | -18 | -6 | 12 |
| 998 | 21 | 35 | 43 | -27 | 1 | -53 | 11 | 41 | 31 | -27 | ... | 51 | 1 | 17 | -37 | 25 | 55 | 29 | -17 | -5 | 11 |
| 999 | 20 | 36 | 44 | -26 | 0 | -52 | 10 | 40 | 32 | -28 | ... | 52 | 0 | 18 | -38 | 26 | 56 | 30 | -16 | -6 | 10 |
1000 rows × 100 columns
plot = hv.Curve(rw_data[0].values)
for i in range(1,50):
plot*=hv.Curve(rw_data[i].values)
plot #+ hv.Histogram(np.histogram(rw_data[4][500:]))
2.2.4. Statistics of random walk¶
from scipy.stats import norm # we need this to fit to normal distribution
fig, ax = plt.subplots(1, 5, figsize=(9, 3), sharey=True)
L=100
N=1000
rw_sum = rw_1d(L,N)
for i in range(5):
t = int(i*(L/5)) + 5 # record dist at 5 equidist points
ax[i].hist(rw_sum[t,:], density=True) # generate histogram at different points
# Fit histogram to normal dist
xmin, xmax = ax[i].get_xlim()
x = np.linspace(xmin, xmax, 100)
y = norm.pdf(x, 0, np.sqrt(t))
ax[i].plot(x,y)
ax[i].set_title(f"t = {t}")
2.2.5. Connection with diffusion: Mean square displacement¶
# Determine the time evolution of the mean square distance.
plt.plot(np.mean(rw_sum**2, axis=1), '-o') # why only rw**2 and not mean(rw-mean(rw))**2 ?
plt.title('Compute mean square deviation of 1D random walker',fontsize=15)
plt.xlabel('Number of steps, n',fontsize=15)
plt.ylabel(r'$MSD(n)=\langle x(n)^2 \rangle$',fontsize=15);
# Questions:
# (i) Compute MSD for @D random walk show the expected scaling
# (ii) Generate random walks from different positions
# (iii) choice between [-1,1]
# (iv) Multinomial choice
# (v) Self avoiding walks
def rw_2d(L, N=1):
'''2d random walk function:
L: trajectory length
N: Number of trajecotry
'''
verteces = np.array([(1,0),(0,1),(-1,0),(0,-1)])
rw = verteces[choice([0,1,2,3], size=(L, N))]
rw[0, :, :] = 0
return rw.cumsum(axis=0)
verteces = np.array([(1,0),(0,1),(-1,0),(0,-1)])
rw = verteces[choice([0,1,2,3], size=(L, N))]
%%time
rw_sum = rw_2d(1000,5)
rw_sum.shape
CPU times: user 1.11 ms, sys: 64 µs, total: 1.17 ms
Wall time: 856 µs
(1000, 5, 2)
plt.plot(rw_sum[:,:,0], rw_sum[:,:,1])
plt.plot('')
[<matplotlib.lines.Line2D at 0x7f2423bfffd0>]
2.2.6. Binomial distribution of a random walker¶
In previous example we started with random variables with no reference to probability distribution. This time we will generate random numbers from a binomial distribution.
This time we will use scipy.stats library which contains probability distirbution functions. That is in addition to generating random variables we can also compute probability distributions and related quantities analytically.
from scipy.stats import binom, norm, poisson
s = binom(10, 0.5) # Let us declare s to be a binomial RV
print(s.rvs(20)) # 20 random samples form X
print(s.pmf(5)) # P(X = 5)
print(s.cdf(5)) # P(X <= 5)
print(s.mean()) # E[X], mean
print(s.var()) # Var(X), variance
print(s.std()) # Std(X), standard deviation
[6 6 6 4 6 7 6 8 3 4 2 2 4 7 6 7 4 5 4 3]
0.24609375000000025
0.6230468749999999
5.0
2.5
1.5811388300841898
def coin_flip(L,p,N):
'''
L: flip coint L times
q: with p probability
N: Repeat experiment N times
'''
coin = binom(L, p) # Binomial RV
coin_flips = coin.rvs(N) # Generate sample of N points
coin_pmf = coin.pmf(np.arange(L+1)) # Generate PMF from 0 to L
coin_cdf = coin.cdf(np.arange(L+1))
return coin_flips, coin_pmf, coin_cdf
fig, ax=plt.subplots(nrows=1, ncols=3,figsize=(12,3))
#Coin flip 1 time.
X1, P1, CP1 = coin_flip(20, 0.5, 80)
ax[0].plot(X1,'--s',color='purple')
ax[1].hist(X1,density=True)
ax[1].plot(P1,'-o')
ax[2].plot(CP1,'-d',color='red')
[<matplotlib.lines.Line2D at 0x7f2423b18910>]
2.3. Continuous time random walk: Brownian motion¶
We have learned about random variables, random walk and have encorunetred a concept of stochastic process on the example of discrete step 1D random walk. Now let us generate the prototypical stochasic process in continuous time; the brownian motion. Brownian motion was first discoverd by a botanist Brown who noticed that a pollen in solution undergo erratic and incessant motion. To simulate brownian motion we take the continuous time limit of random walk and approximate dsiplacements of our particle as normally distributed (binomial->normal, time step->continuous time)
We assume we have started at position \(\mu=0\) and our variance is given by \(\sigma^2=2Dt\) Where D is the diffusion coefficnets which is related to parameters of discree random walk as shown in the lecture.
In the last step we re-wrote brownian motion equation in a convenient way by shifting normally distributed radnom variable by \(\mu\) and scaling it by \(\sigma\)
$\(N(\mu, \sigma^2)=\mu+\sigma N(0,1)\)$.
def brown(T, N=1, dim=1, dt=1, D=1):
"""
Creates 2D brownian path given:
time T
N=1 trajecotires
dim=2: 2D by default
dt=1 timestep
D=1 diffusion coeff
"""
nT = int(T/dt) # how many points to sample
dR = np.sqrt(2*D*dt) * np.random.randn(N, nT, dim) # 3D position of brownian particle
R = np.cumsum(dR, axis=1) # accumulated 3D position of brownian particle
return R
2.3.1. Brownian motion in 1D¶
R=brown(1000, N=50,dim=1)
R.shape
(50, 1000, 1)
[plt.plot(R[i, :, :]) for i in range(3)];
2.3.2. Brownian motion in 2D/3D¶
R=brown(1000,N=10,dim=2)
print(R.shape)
print('')
print(len(R))
(10, 1000, 2)
10
R=brown(500, N=10, dim=2)
hv.extension('plotly')
curve = hv.Curve(R[0,:,:])
for i in range(1,len(R)):
curve*=hv.Curve(R[i,:,:])
curve.opts(width=600, height=600)
R=brown(500, N=10, dim=3)
hv.extension('plotly')
hv.Path3D( R[i,:,:] for i in range(10) ).opts(width=800,height=800, line_width=3)
2.3.3. Statistics of Brownian motion¶
R = brown(1000, N=10000, dim=1)
R.shape
(10000, 1000, 1)
[plt.hist(R[:,i,:], alpha=0.3, density=True) for i in range(5)];
2.3.4. Diffusion Equation¶
The movement of individual random walkers \(\leftrightarrow\) density of walkers \(\rho(\vec{r},t)\)
Diffusion equation:
Formulated empirically as Fick’s laws
This is a 2nd order PDE! Unlike equations of motion diff eq shows irreersibile behaviour
This one exactly solvable. But in general reaction-diffusion PDEs difficult to solve analytically.
Can solve numerically by writing derivatives as finite differences!
Can also simulate via random walk!
Diffusion coefficient \(D\), Units \([L^2]/[T]\)
Important special case solution (here written in 1d):
where \(\sigma(t)=\sqrt{2{D}t}\)
density remains Gaussian
Gaussian becomes wider with time
check that this is indeed a solution by plugging into the diffusion equation!
import ipywidgets as widgets
# time dependent sigma
def sigma(t, D = 1):
return np.sqrt(2*D*t)
# density function is gaussian
def gaussian(x, t):
return 1/np.sqrt(2*np.pi*sigma(t)**2) * np.exp(-x**2/(2*sigma(t)**2)) #
@widgets.interact(t=(0.001,1,0.001))
def diffusion(t=0.001):
x = np.linspace(-4, 4, 1000)
plt.plot(x, gaussian(x, 0.001), '--', color='orange')
plt.plot(x, gaussian(x, t), lw=3, color='green')
2.4. References¶
The mighty little books
More in depth
On the applied side